More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy.
نویسندگان
چکیده
The relative contribution of taxon number and gene number to accuracy in phylogenetic inference is a major issue in phylogenetics and of central importance to the choice of experimental strategies for the successful reconstruction of a broad sketch of the tree of life. Maximization of the number of taxa sampled is the strategy favored by most phylogeneticists, although its necessity remains the subject of debate. Vast increases in gene number are now possible due to advances in genomics, but large numbers of genes will be available for only modest numbers of taxa, raising the question of whether such genome-scale phylogenies will be robust to the addition of taxa. To examine the relative benefit of increasing taxon number or gene number to phylogenetic accuracy, we have developed an assay that utilizes the symmetric difference tree distance as a measure of phylogenetic accuracy. We have applied this assay to a genome-scale data matrix containing 106 genes from 14 yeast species. Our results show that increasing taxon number correlates with a slight decrease in phylogenetic accuracy. In contrast, increasing gene number has a significant positive effect on phylogenetic accuracy. Analyses of an additional taxon-rich data matrix from the same yeast clade show that taxon number does not have a significant effect on phylogenetic accuracy. The positive effect of gene number and the lack of effect of taxon number on phylogenetic accuracy are also corroborated by analyses of two data matrices from mammals and angiosperm plants, respectively. We conclude that, for typical data sets, the number of genes utilized may be a more important determinant of phylogenetic accuracy than taxon number.
منابع مشابه
Points of View
Taxon sampling is often thought to be of extreme importance for phylogenetic inference, and increased sampling of taxa is commonly advocated as a solution to resolving problematic phylogenies. Another solution is to increase the number of sites (by sequencing additional genes) sampled for each taxon. In an ideal world, one would like to increase samples of both taxa and genes, but taxon samplin...
متن کاملDiagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets
With the advancement of metagenome data mining science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. So, it follows that you can remove redundant genes and increase the speed and accuracy of classification. After applying the gene se...
متن کاملMagic bullets and golden rules: data sampling in molecular phylogenetics.
Data collection for molecular phylogenetic studies is based on samples of both genes and taxa. In an ideal world, with no limitations to resources, as many genes could be sampled as deemed necessary to address phylogenetic problems. Given limited resources in the real world, inadequate (in terms of choice of genes or number of genes) sequences or restricted taxon sampling can adversely affect t...
متن کاملMore taxa or more characters revisited: combining data from nuclear protein-encoding genes for phylogenetic analyses of Noctuoidea (Insecta: Lepidoptera).
A central question concerning data collection strategy for molecular phylogenies has been, is it better to increase the number of characters or the number of taxa sampled to improve the robustness of a phylogeny estimate? A recent simulation study concluded that increasing the number of taxa sampled is preferable to increasing the number of nucleotide characters, if taxa are chosen specifically...
متن کاملIs sparse taxon sampling a problem for phylogenetic inference?
As expected, with more data (total nucleotides) we were able to reconstruct more accurate phylogenies. When the number of taxa sampled per clade was increased, the interrelationships of those clades could be inferred more accurately. However, sampling in experimental design is only relevant in the context of resource limitation; therefore, to compare apples to apples, we used the number of nucl...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Molecular biology and evolution
دوره 22 5 شماره
صفحات -
تاریخ انتشار 2005